Accurate identification of polyadenylation sites from 3′ end deep sequencing using a naïve Bayes classifier

Authors

  • Sarah Sheppard
  • Nathan D. Lawson
  • Lihua Julie Zhu

Abstract

Motivation: 3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3′ ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can identify artifactual polyadenylation sites caused by internal priming in homopolymeric stretches of adenines. Although heuristic filters have been applied in these cases, they typically produce a high proportion of both false-positive and false-negative classifications. There is therefore a need for improved algorithms that better identify mis-priming events in oligo-dT-primed sequences.

Results: By analyzing sequence features flanking 3′ ends derived from oligo-dT-based sequencing, we developed a naïve Bayes classifier that labels each 3′ end as either true or false (internally primed). The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites.
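The abstract does not describe the model's features, so the following is only a minimal sketch of how a naïve Bayes classifier could separate genuine 3′ ends from internal-priming artifacts. It assumes that each candidate site is represented by a fixed-length genomic window around it and that the features are simply the nucleotides at each position of that window. The class name `PositionalNaiveBayes`, the labels `"true"` and `"internal"`, the window length and the toy training sequences are all hypothetical. They are not the authors' features, training data or implementation.

```python
import math
from collections import defaultdict

BASES = "ACGT"


class PositionalNaiveBayes:
    """Naive Bayes over the nucleotide at each position of a fixed-length window.

    Hypothetical setup (not the published method): every candidate 3' end is
    represented by a genomic window of the same length around it, and the
    positions are assumed to be conditionally independent given the label.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha      # Laplace pseudocount; avoids log(0)
        self.length = None
        self.log_prior = {}     # label -> log P(label)
        self.log_lik = {}       # label -> [ {base: log P(base at i | label)} ]

    def _check(self, seq):
        if len(seq) != self.length:
            raise ValueError(f"expected a window of length {self.length}, got {len(seq)}")

    def fit(self, windows, labels):
        self.length = len(windows[0])
        counts = defaultdict(
            lambda: [dict.fromkeys(BASES, 0) for _ in range(self.length)])
        n_per_label = defaultdict(int)
        for seq, label in zip(windows, labels):
            self._check(seq)
            n_per_label[label] += 1
            for i, base in enumerate(seq.upper()):
                if base in BASES:   # skip ambiguous calls such as N
                    counts[label][i][base] += 1
        total = sum(n_per_label.values())
        for label, n in n_per_label.items():
            self.log_prior[label] = math.log(n / total)
            self.log_lik[label] = []
            for pos in counts[label]:
                denom = sum(pos.values()) + self.alpha * len(BASES)
                self.log_lik[label].append(
                    {b: math.log((pos[b] + self.alpha) / denom) for b in BASES})
        return self

    def log_scores(self, seq):
        """Unnormalised log posterior: log P(label) + sum_i log P(base_i | label)."""
        self._check(seq)
        scores = {}
        for label, prior in self.log_prior.items():
            s = prior
            for i, base in enumerate(seq.upper()):
                if base in BASES:
                    s += self.log_lik[label][i][base]
            scores[label] = s
        return scores

    def predict_proba(self, seq):
        scores = self.log_scores(seq)
        top = max(scores.values())   # log-sum-exp trick for numerical stability
        z = sum(math.exp(s - top) for s in scores.values())
        return {label: math.exp(s - top) / z for label, s in scores.items()}

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Hypothetical labels and made-up 10-nt toy windows, not data from the paper:
# "true" examples carry an AATAAA-like motif, "internal" ones an A-rich 3' half.
TOY_WINDOWS = [
    "AATAAACGTC", "AATAAAGCTC", "ATTAAACCGT",
    "CGTCGAAAAA", "GCTCAAAAAA", "TGCAGAAAAA",
]
TOY_LABELS = ["true"] * 3 + ["internal"] * 3

model = PositionalNaiveBayes().fit(TOY_WINDOWS, TOY_LABELS)
print(model.predict("AATAAAGGTC"))   # -> true
print(model.predict("GCTCGAAAAA"))   # -> internal
```

Working in log space keeps the product of many per-position probabilities from underflowing. The pseudocount `alpha` keeps a base never seen at a given position in one class from forcing that class's probability to zero. A real classifier would be trained on many labeled sites and would likely use wider windows or other sequence features.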

Similar resources

Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation

Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated r...

PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which were limited in number and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. ...

Groundwater Potential Mapping using Index of Entropy and Naïve Bayes Models at Ardabil Plain

Although groundwater resources have long been regarded as a safe choice for meeting human water requirements, their overexploitation, especially in the Ardabil plain, has led to a decline in the quality and quantity of these resources. One important solution is to identify groundwater potential zones and exploit them according to their potential. The aim of th...

Image Classification Using Naïve Bayes Classifier

An image classification scheme using Naïve Bayes Classifier is proposed in this paper. The proposed Naive Bayes Classifier-based image classifier can be considered as the maximum a posteriori decision rule. The Naïve Bayes Classifier can produce very accurate classification results with a minimum training time when compared to conventional supervised or unsupervised learning algorithms. Compreh...

Journal:
  • Bioinformatics

Volume 29, Issue 20

Publication date: 2013